



DOCUMENT RESUME 

ED 030 981 24 EA 002 408 

By -Harris, Chester 

Comments on Professor Wiley’s Paper Entitled "Design and Analysis of Evaluation Studies." 

California Univ., Los Angeles. Center for the Study of Evaluation of Instructional Programs. 

Spons Agency- Office of Education (DHEW), Washington, D.C. Bureau of Research. 

Report No-CSE-R-29 
Bureau No-BR-6-1646
Pub Date May 69
Contract-OEC-4-6-061646-1909

Note-7p.; From the Proceedings of the Symposium on Problems in the Evaluation of Instruction (Los Angeles,
December, 1967).

EDRS Price MF-$0.25 HC-$0.45

Descriptors- Data Analysis, *Evaluation Techniques, Hypothesis Testing, Input Output Analysis, *Instructional
Programs, *Methodology, *Research Design

Three critical issues in the design and analysis of evaluation studies suggested 
at the conference are (1) the univariate versus multivariate dependent variable
studies, (2) the choice of a response surface design over the conventional fixed 
model, and (3) the tendency to interpret every study as if it were being done for the 
first time. Taking into account prior information is a step that would most improve the 
design and analysis of evaluation studies. Related documents are EA 002 409 and EA 
002 535 (MLF) 









CO-DIRECTORS 



Merlin C. Wittrock
Erick L. Lindman

ASSOCIATE DIRECTORS 

Marvin C. Alkin
Frank Massey, Jr.
C. Robert Pace



The CENTER FOR THE STUDY OF EVALUATION OF INSTRUCTIONAL
PROGRAMS is engaged in research that will yield new ideas
and new tools capable of analyzing and evaluating instruction.
Staff members are creating new ways to evaluate content
of curricula, methods of teaching and the multiple
effects of both on students. The CENTER is unique because
of its access to Southern California’s elementary, secondary
and higher schools of diverse socio-economic levels
and cultural backgrounds.






Chester Harris 
University of Wisconsin 



From the Proceedings of the 

SYMPOSIUM ON PROBLEMS IN THE EVALUATION OF INSTRUCTION 

University of California, Los Angeles 
December, 1967 

M. C. Wittrock, Chairman 

Sponsored by the Center for the 
Study of Evaluation 



The research and development reported herein was
performed pursuant to a contract with the United
States Department of Health, Education, and Welfare,
Office of Education, under the provisions of
the Cooperative Research Program.



CSE Report No. 29, May, 1969 
University of California, Los Angeles 

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION 

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE 
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS 
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION 
POSITION OR POLICY. 










COMMENTS ON PROFESSOR WILEY’S PAPER ENTITLED
“DESIGN AND ANALYSIS OF EVALUATION STUDIES”

Chester Harris 

We have come into the third day of this conference, and enough 
things have been said in various contexts to make it possible for 
me to point out some things that bear in general on Mr. Wiley's
paper, but still more generally on the whole set of papers. 

I think that the most important contribution that can be made 
at this point in the conference is to identify and enumerate what 
I regard as three critical issues in the design and analysis of 
evaluation studies suggested in these papers and discussions. The 
area of design and analysis is actively changing and developing, 
and most of us would be hard pressed to predict the extent to 
which these issues will be resolved or reformulated in the near 
future. The measurement problem in evaluation studies involves 
a situation in which we have an instructional package that is to 
be used with some group of human subjects, and then evaluated in 
terms of how good it is. This demands that we adopt some scheme 
for specifying what we mean by “good.”

There appear to be three types of “goodness” for those who
take the behavior of students as the relevant evidence. One is 
goodness defined as a level of performance; a second is goodness 
defined as change of performance in a specified direction; and a
third is goodness defined as change of performance in a specified
direction to a specified extent. Buried here are the questions of 
which behaviors are relevant and whether the observations that 
are made can become bases for inferences regarding learning as a 
result of the instructional package. This is an issue which Dr.
Gagné posed for us earlier in the session. These three attitudes
imply somewhat different measurement operations for any chosen type 
of performance. Let us leave this with the further acknowledgment 
that in any study many different types of performance may be regarded
as important dependent variables, and that the amount of
work required to make preparations for an evaluation study may be 
extensive. 
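The three definitions of goodness can be read as three different decision rules applied to the same scores. The following is a minimal sketch, assuming a single chosen type of performance measured by a pretest and a posttest; the scores, the threshold, the target gain, and the function names are hypothetical illustrations and do not come from the paper.

```python
from statistics import mean

# Hypothetical pretest and posttest scores for one chosen type of performance.
pre = [40, 52, 47, 60, 35]
post = [58, 55, 61, 66, 50]


def mean_gain(pre_scores, post_scores):
    """Average posttest-minus-pretest change across students."""
    return mean(b - a for a, b in zip(pre_scores, post_scores))


def good_by_level(post_scores, threshold):
    """Goodness as a level of performance: mean posttest reaches a threshold."""
    return mean(post_scores) >= threshold


def good_by_direction(pre_scores, post_scores):
    """Goodness as change in a specified (here, upward) direction."""
    return mean_gain(pre_scores, post_scores) > 0


def good_by_extent(pre_scores, post_scores, target_gain):
    """Goodness as change in the specified direction to a specified extent."""
    return mean_gain(pre_scores, post_scores) >= target_gain


print(good_by_level(post, threshold=60))          # False: mean posttest is 58
print(good_by_direction(pre, post))               # True: mean gain is 11.2
print(good_by_extent(pre, post, target_gain=15))  # False: 11.2 falls short of 15
```

The first rule needs only a posttest, while the other two also require a pretest; that is one concrete sense in which the three attitudes imply different measurement operations.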

The reality that there may be many relevant dependent variables
also suggests that appropriate designs for evaluation probably 
should be multivariate. This is the first issue which I wish to 
identify, the issue of univariate versus multivariate dependent 
variable studies. My strategy is not to resolve the issue but 
merely to enumerate the factors involved. 

Possibly the simplest design for an evaluation study is that 
which employs only one instructional package and attempts to assess 
its goodness for two or more categories or types of students. Here 
we employ stratifying variables: age, sex, intelligence level,
residential region, etc., to define our groups of students, and
then compare and contrast the various student performances. The
intent of such a study is primarily descriptive (though tests of
significance often are run): to define the goodness of the
instructional package with respect to specified groups. This is
a fixed-effects model, with the chosen levels of the stratifying 
variables being the only ones about which information is gained. 
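As a minimal sketch of the descriptive summary such a single-package design produces, assume hypothetical scores for students who all used the same package, stratified by grade and sex (both stratifying variables chosen only for illustration). Only the strata actually sampled appear in the result, which mirrors the fixed-effects character of the design.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (grade, sex, score) records; every student used the same package.
records = [
    (4, "girls", 71), (4, "boys", 66), (4, "girls", 75), (4, "boys", 62),
    (6, "girls", 80), (6, "boys", 78), (6, "girls", 84), (6, "boys", 74),
]


def stratum_means(rows):
    """Mean score for each (grade, sex) stratum present in the data.

    Only the strata actually sampled appear, so the summary describes these
    chosen levels of the stratifying variables and no others.
    """
    groups = defaultdict(list)
    for grade, sex, score in rows:
        groups[(grade, sex)].append(score)
    return {stratum: mean(scores) for stratum, scores in sorted(groups.items())}


for (grade, sex), m in stratum_means(records).items():
    print(f"grade {grade}, {sex}: {m}")
```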

Here there arises an issue which I will describe by extending the 
design so that more than one instructional package is used. I 
assume that we may retain one or more stratifying variables as well, 
and thus have a reasonably complicated design. I will not, however, 
complicate it by introducing repeated measurements. Such a design
has as its intent a comparison among instructional packages for
various groups and sub-groups. I repeat that in practice this is
a fixed model; for we seem absolutely unable to define a population
of instructional packages, and, even if we could, to be quite
unwilling to select at random a set of instructional packages to study.
Instead, we select the packages arbitrarily and deliberately; this is 
a fixed effect. 

A design such as this has limitations that are inherent in all 
hypothesis testing. Among them is the familiar problem posed by the 
reasonable assertion that no sharp hypothesis can possibly be true. 
Testing such a hypothesis is merely an exercise in testmanship, since
the outcome depends heavily upon the flexibility with which the test
can be manipulated (for example, through the choice of sample size).



It is perfectly reasonable to assert that no two instructional 
packages can possibly have identically the same effect; thus the 
testing of the hypothesis that two or more such packages have the 
same mean effect can be viewed as relatively unimportant. This 
represents my attitude toward the decision theoretic approach 
which has been mentioned over and over again at this conference. 
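The claim that no sharp hypothesis can be exactly true has a practical consequence: any nonzero difference, however trivial, becomes "significant" once the samples are large enough. The sketch below illustrates this under stated assumptions: two packages, equal group sizes, a known common standard deviation, and a normal-theory z test. It reports the z obtained when the observed difference exactly equals the true one (real samples scatter around this value); the 0.5-point difference and 10-point SD are hypothetical.

```python
import math
from statistics import NormalDist


def z_for_true_difference(diff, sd, n_per_group):
    """z statistic obtained when the observed difference between two group
    means equals the true difference `diff` (known common SD, equal n)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return diff / standard_error


def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def min_n_for_significance(diff, sd, alpha=0.05):
    """Smallest n per group at which `diff` yields a two-sided p < alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.floor(2 * (z_crit * sd / diff) ** 2) + 1


# Hypothetical: two packages whose true mean effects differ by only 0.5
# points on a test with a standard deviation of 10 points (0.05 SD).
diff, sd = 0.5, 10.0
for n in (50, 500, 5000, 50000):
    z = z_for_true_difference(diff, sd, n)
    print(f"n per group = {n:>6}: z = {z:5.2f}, p = {two_sided_p(z):.4f}")
print(min_n_for_significance(diff, sd))
```

The difference itself never changes; only the sample size does, and with it the verdict of the test.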

Those who criticize hypothesis testing urge that we use estimation
procedures instead. The question of what kind of estimation
procedure is useful here is an important one. Some interest exists 
in developing an analogue of response surface methodology for 
evaluation studies. It is an analogue, since the elements of instructional
packages that can be identified often exist in only a few
discrete rather than continuously ordered forms. This creates some 
problems with the statistics, but in time these problems may be made 
manageable. 

The response surface design attempts to vary inputs (elements 
of instruction) to the end of identifying an optimum or maximum 
output performance. This is quite a different approach to evaluation
studies. The choice of this approach as opposed to the more
conventional fixed model constitutes a second important issue. 
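One way to picture the analogue for discrete instructional elements is sketched below, under several assumptions not taken from the paper: a single ordered element (a hypothetical number of practice sessions) available in only five forms, hypothetical observed mean performance at each form, and a second-order (quadratic) model fitted by least squares. The fitted curve locates a continuous optimum, but because only the discrete forms exist, the choice is restricted to the best available form. Real response-surface work would use several elements and a purpose-built design; this is only a one-factor illustration.

```python
def fit_quadratic(xs, ys):
    """Least-squares coefficients (b0, b1, b2) of y = b0 + b1*x + b2*x**2,
    found by solving the 3x3 normal equations with Gaussian elimination."""
    cols = [[1.0 for _ in xs], [float(x) for x in xs], [float(x) ** 2 for x in xs]]
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(u * y for u, y in zip(ci, ys)) for ci in cols]
    n = len(cols)
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))  # partial pivoting
        a[k], a[pivot] = a[pivot], a[k]
        b[k], b[pivot] = b[pivot], b[k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
            b[r] -= factor * b[k]
    coef = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        coef[k] = (b[k] - sum(a[k][c] * coef[c] for c in range(k + 1, n))) / a[k][k]
    return coef


# Hypothetical: one instructional element (number of practice sessions)
# available only in five discrete forms, with observed mean performance.
levels = [1, 2, 3, 4, 5]
means = [52, 60, 64, 65, 62]

b0, b1, b2 = fit_quadratic(levels, means)
vertex = -b1 / (2 * b2)  # continuous optimum; a maximum because b2 < 0


def fitted(x):
    return b0 + b1 * x + b2 * x ** 2


best_available = max(levels, key=fitted)  # optimum restricted to forms that exist
print(f"fitted optimum at x = {vertex:.2f}; best available form: {best_available}")
```

Here the fitted optimum falls at 3.7 sessions, between two forms that actually exist, so the recommendation is the available form with the highest fitted response (4 sessions). This gap between a continuous optimum and discrete options is part of what makes the statistics less straightforward than in classical response-surface work.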

Let me raise a third issue which is often associated with a 
Bayesian point of view in statistics. The fact that we tend to 
interpret every study as if it were being done for the first time
should make us uneasy, even though we still cannot agree on how
prior information should be incorporated into our analysis. Actually,
there often are relevant prior findings that remain unused.

I am reminded of how we behave in directing dissertations.
We always insist on a summary of previous findings in an early
chapter, but we would be horrified if the student tried to integrate
them numerically with his findings. The issue here is the
extent to which, in any evaluation study, the design and analysis 
will ignore all the possible prior distributions. 
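Integrating earlier findings numerically can be as simple as the normal-normal (conjugate) update sketched below. This is one standard Bayesian approach, not a procedure proposed in the paper, and it assumes that the prior findings can be summarized as a single normal estimate of the effect with a known standard error, and that the current study's estimate is likewise normal with a known standard error. The numbers are hypothetical.

```python
import math


def combine_normal(prior_mean, prior_se, new_est, new_se):
    """Normal-normal (conjugate) update for a single effect size.

    Treats the summary of prior findings as a normal prior and the current
    study's estimate as a normal likelihood with known standard error; the
    posterior mean is the precision-weighted average of the two.
    """
    w_prior = 1 / prior_se ** 2  # precision of the prior findings
    w_new = 1 / new_se ** 2      # precision of the current study
    post_var = 1 / (w_prior + w_new)
    post_mean = post_var * (w_prior * prior_mean + w_new * new_est)
    return post_mean, math.sqrt(post_var)


# Hypothetical: earlier studies suggest a gain of 4 points (SE 2);
# the current study estimates 8 points (SE 2).
effect, effect_se = combine_normal(4.0, 2.0, 8.0, 2.0)
print(f"posterior effect = {effect:.2f}, SE = {effect_se:.2f}")  # 6.00, 1.41
```

With equally precise sources the result falls halfway between them, and its standard error is smaller than either source's alone. How much weight the prior findings deserve is exactly the point on which, as noted above, agreement is still lacking.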

A modification in practice, namely, learning to take into
account the prior information, might be the one that would most
improve the design and analysis of evaluation studies.



